Motivated by the problem of testing tetrad constraints in factor analysis, we study the large-sample distribution of Wald statistics at parameter points at which the gradient of the tested constraint vanishes. When based on an asymptotically normal estimator, the Wald statistic converges to a rational function of a normal random vector. The rational function is determined by a homogeneous polynomial and a covariance matrix. For quadratic forms and bivariate monomials of arbitrary degree, we show unexpected relationships to chi-square distributions that explain conservative behavior of certain Wald tests. For general monomials, we offer a conjecture according to which the reciprocal of a certain quadratic form in the reciprocals of dependent normal random variables is chi-square distributed.



1. Introduction. Let $f \in \mathbb{R}[x_1,\dots,x_k]$ be a homogeneous $k$-variate polynomial with gradient $\nabla f$, and let $\Sigma$ be a $k\times k$ positive semidefinite matrix with positive diagonal entries. In this paper, we study the distribution of the random variable

$$W_{f,\Sigma} = \frac{f(X)^2}{(\nabla f(X))^T\,\Sigma\,\nabla f(X)}, \tag{1.1}$$

where $X \sim \mathcal{N}(0,\Sigma)$ is a normal random vector with zero mean and covariance matrix $\Sigma$. The random variable $W_{f,\Sigma}$ arises in the description of the large-sample behavior of Wald tests, with $\Sigma$ being the asymptotic covariance matrix of an estimator and the polynomial $f$ appearing in a Taylor approximation to the function that defines the constraint to be tested. In regular settings, the Wald statistic for a single constraint converges to $\chi^2_1$, the chi-square distribution with one degree of freedom. This familiar fact is recovered when $f(x) = a^Tx$, $a \neq 0$, is a linear form and

$$W_{f,\Sigma} = \left(\frac{a^TX}{\sqrt{a^T\Sigma a}}\right)^2 \tag{1.2}$$

becomes the square of a standard normal random variable; the vector $a$ corresponds to a nonzero gradient of the tested constraint. Our attention is
2 M. DRTON AND H. XIAO

devoted to cases in which $f$ has degree two or larger. These singular cases occur when the gradient of the constraint is zero at the true parameter.
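To make (1.1) concrete, the following minimal Python sketch simulates $W_{f,\Sigma}$ for a bivariate polynomial supplied together with its gradient. It uses only the standard library; the helper names (`sample_bivariate_normal`, `w_value`, `sample_w`) and the numerical values are illustrative and not taken from the paper. As a sanity check of (1.2), a linear form should produce draws whose average is close to 1, the mean of $\chi^2_1$.

```python
import math
import random


def sample_bivariate_normal(rng, sigma):
    """Draw X ~ N(0, Sigma) for a 2x2 positive definite Sigma via its Cholesky factor."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[0][1] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (l11 * z1, l21 * z1 + l22 * z2)


def w_value(f, grad, x, sigma):
    """Evaluate f(x)^2 / ((grad f(x))^T Sigma grad f(x)), the ratio in (1.1)."""
    g = grad(x)
    quad = sum(g[i] * sigma[i][j] * g[j] for i in range(2) for j in range(2))
    return f(x) ** 2 / quad


def sample_w(f, grad, sigma, n, seed=0):
    """Return n Monte Carlo draws of W_{f,Sigma} for a bivariate polynomial f."""
    rng = random.Random(seed)
    return [w_value(f, grad, sample_bivariate_normal(rng, sigma), sigma) for _ in range(n)]


# Linear form f(x) = a^T x: by (1.2), W_{f,Sigma} is chi-square with one degree of freedom.
a = (1.0, -2.0)
sigma = [[1.0, 0.3], [0.3, 2.0]]


def f_lin(x):
    return a[0] * x[0] + a[1] * x[1]


def grad_lin(x):
    return a


draws = sample_w(f_lin, grad_lin, sigma, 100_000)
print(sum(draws) / len(draws))  # should be close to 1
```

Swapping in a nonlinear `f` and its gradient (for instance $x_1x_2$ with gradient $(x_2, x_1)$) gives draws from the singular limits studied below.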

For likelihood ratio tests, a large body of literature starting with Chernoff (1954) describes large-sample behavior in irregular settings; examples of recent work are Azaïs, Gassiat and Mercadier (2006), Drton (2009), Kato and Kuriki (2013) and Ritz and Skovgaard (2005). In contrast, much less work appears to exist for Wald tests. Three examples we are aware of are Glonek (1993), Gaffke, Steyer and von Davier (1999) and Gaffke, Heiligers and Offinger (2002), who treat singular hypotheses that correspond to collapsibility of contingency tables and confounding in regression. Our own interest is motivated by the fact that graphical models with hidden variables are singular (Drton, Sturmfels and Sullivant, 2009, Chap. 4).

In graphical modeling, or more specifically in factor analysis, the testing of so-called 'tetrad constraints' is a problem of particular practical relevance (Bollen, Lennox and Dahly, 2009; Bollen and Ting, 2000; Hipp and Bollen, 2003; Silva et al., 2006; Spirtes, Glymour and Scheines, 2000). This problem goes back to Spearman (1904); for some of the history see Harman (1976). The desire to better understand the Wald statistic for a tetrad was the initial statistical motivation for this work. We solve the tetrad problem in Section 5; the relevant polynomial is quadratic, namely, $f(x) = x_1x_4 - x_2x_3$. However, many other hypotheses are of interest in graphical modeling and beyond (Drton, Sturmfels and Sullivant, 2007; Drton, Massam and Olkin, 2008; Sullivant, Talaska and Draisma, 2010; Zwiernik and Smith, 2012). In principle, any homogeneous polynomial $f$ could arise in the description of a large-sample limit and, thus, a general distribution theory for the random variable $W_{f,\Sigma}$ from (1.1) would be desirable.

At first sight, it may seem as if not much concrete can be said about $W_{f,\Sigma}$ when $f$ has degree two or larger. However, the distribution of $W_{f,\Sigma}$ can, in surprising ways, be independent of the covariance matrix $\Sigma$ even if $\deg(f) \ge 2$. Glonek (1993) was the first to show this in his study of the case $f(x) = x_1x_2$, which is relevant, in particular, for hypotheses that are the union of two sets. Moreover, the asymptotic distribution in this case is stochastically smaller than $\chi^2_1$, making the Wald test maintain (at times quite conservatively) a desired asymptotic level across the entire null hypothesis. We will show that similar phenomena also hold in degree higher than two; see Section 3, which treats monomials $f(x) = x_1^{\alpha_1}x_2^{\alpha_2}$. For the tetrad, conservativeness has been remarked upon in work such as Johnson and Bodner (2007). According to our work in Section 5, this is due to the singular nature of the hypothesis rather than effects of too small a sample size. We remark that in singular settings standard $n$-out-of-$n$ bootstrap tests may fail to achieve a desired asymptotic size, so that one needs to consider $m$-out-of-$n$ and subsampling procedures; compare the discussion and references in Drton and Williams (2011).

In the remainder of this paper, we first clarify the connection between Wald tests and the random variables $W_{f,\Sigma}$ from (1.1); see Section 2. Bivariate monomials $f$ of arbitrary degree are the topic of Section 3. Quadratic forms $f$ are treated in Section 4, which gives a full classification of the bivariate case. The tetrad is studied in Section 5. Our proofs make heavy use of the polar coordinate representation of a pair of independent standard normal random variables and, unfortunately, we have so far not been able to prove the following conjecture, which we discuss further in Section 6.

Conjecture 1.1. Let $\Sigma$ be any positive semidefinite $k\times k$ matrix with positive diagonal entries. If $f(x_1,\dots,x_k) = x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_k^{\alpha_k}$ with nonnegative real exponents $\alpha_1,\dots,\alpha_k$ that are not all zero, then

$$W_{f,\Sigma} \sim \frac{1}{(\alpha_1+\cdots+\alpha_k)^2}\,\chi^2_1.$$

It is not difficult to show that the conjecture holds when $\Sigma$ is diagonal.

Proof under independence. Let $Z$ be a standard normal random variable, and $\alpha > 0$. Then $\alpha^2/Z^2$ follows the one-sided stable distribution of index $\frac12$ with parameter $\alpha$, which has the density

$$p_\alpha(x) = \frac{\alpha}{\sqrt{2\pi}}\,\frac{1}{\sqrt{x^3}}\,e^{-\alpha^2/(2x)}, \qquad x > 0. \tag{1.3}$$



The law in (1.3) is the distribution of the first passage time of a Brownian motion to the level $\alpha$ (Feller, 1966). Hence, it has the convolution rule

$$p_\alpha * p_\beta = p_{\alpha+\beta}, \qquad \alpha, \beta > 0. \tag{1.4}$$

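When $f(x) = x_1^{\alpha_1}\cdots x_k^{\alpha_k}$ and $\Sigma = (\sigma_{ij})$ is diagonal with $\sigma_{11},\dots,\sigma_{kk} > 0$, then $\nabla f(x) = f(x)\,(\alpha_1/x_1,\dots,\alpha_k/x_k)$ and

$$\frac{1}{W_{f,\Sigma}} = \frac{\alpha_1^2\sigma_{11}}{X_1^2} + \cdots + \frac{\alpha_k^2\sigma_{kk}}{X_k^2}.$$

The summands are independent, and each summand $\alpha_i^2\sigma_{ii}/X_i^2 = \alpha_i^2/(X_i/\sqrt{\sigma_{ii}})^2$ has the law (1.3) with parameter $\alpha_i$. By (1.4), the distribution of $1/W_{f,\Sigma}$ is that of $(\alpha_1+\cdots+\alpha_k)^2/Z^2$. Therefore,

$$W_{f,\Sigma} \sim \frac{1}{(\alpha_1+\cdots+\alpha_k)^2}\,\chi^2_1,$$

as claimed in Conjecture 1.1. □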
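A quick Monte Carlo check of the diagonal case, written as a Python sketch with the standard library only: it evaluates $1/W_{f,\Sigma}$ through the display above and compares the empirical distribution function of $W_{f,\Sigma}$ with that of $\chi^2_1/(\alpha_1+\cdots+\alpha_k)^2$, using $P(\chi^2_1 \le y) = \mathrm{erf}(\sqrt{y/2})$. The exponents (real exponents are allowed, as in Conjecture 1.1), variances, sample size and seed are arbitrary illustrative choices.

```python
import math
import random


def inverse_w_diagonal(alphas, variances, x):
    """1 / W_{f,Sigma} = sum_i alpha_i^2 sigma_ii / x_i^2 for a monomial f and diagonal Sigma."""
    return sum(a * a * v / (xi * xi) for a, v, xi in zip(alphas, variances, x))


def empirical_cdf_w(alphas, variances, t, n=50_000, seed=1):
    """Monte Carlo estimate of P(W_{f,Sigma} <= t), computed as P(1/W >= 1/t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = [rng.gauss(0.0, math.sqrt(v)) for v in variances]  # X_i ~ N(0, sigma_ii)
        if inverse_w_diagonal(alphas, variances, x) >= 1.0 / t:
            hits += 1
    return hits / n


def claimed_cdf(alphas, t):
    """P(chi2_1 / s^2 <= t) with s = alpha_1 + ... + alpha_k."""
    s = sum(alphas)
    return math.erf(math.sqrt(s * s * t / 2.0))


alphas = [1.0, 2.0, 0.5]      # exponents alpha_i (illustrative)
variances = [1.0, 4.0, 0.25]  # diagonal entries sigma_ii (illustrative)
for t in (0.02, 0.1, 0.3):
    print(t, empirical_cdf_w(alphas, variances, t), claimed_cdf(alphas, t))
```

The two columns should agree up to Monte Carlo error of a few thousandths; replacing the diagonal covariance by a dependent one is exactly where this simple argument breaks down.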
The preceding argument can be traced back to Shepp (1964); see also DasGupta and Shepp (2004). However, if $X$ is a dependent random vector, the argument no longer applies. The case $k = 2$, $\alpha_1 = \alpha_2 = 1$ and $\Sigma$ arbitrary was proved by Glonek (1993); see Theorem 2.3 below. We prove the general statement for $k = 2$ as Theorem 3.1.

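2. Wald tests. To make the connection between Wald tests and the random variables $W_{f,\Sigma}$ from (1.1) explicit, suppose that $\theta \in \mathbb{R}^k$ is a parameter of a statistical model and that, based on a sample of size $n$, we wish to test the hypothesis

$$H_0: \gamma(\theta) = 0 \quad\text{versus}\quad H_1: \gamma(\theta) \neq 0 \tag{2.1}$$

for a continuously differentiable function $\gamma: \mathbb{R}^k \to \mathbb{R}$. Suppose further that there is a $\sqrt{n}$-consistent estimator $\hat\theta$ of $\theta$ such that, as $n \to \infty$, we have the convergence in distribution

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{d} \mathcal{N}(0, \Sigma(\theta)),$$

where the asymptotic covariance matrix $\Sigma(\theta)$ is a continuous function of the parameter. The Wald statistic for testing (2.1) is the ratio

$$T_\gamma = \frac{\gamma(\hat\theta)^2}{\widehat{\mathrm{var}}[\gamma(\hat\theta)]} = \frac{n\,\gamma(\hat\theta)^2}{(\nabla\gamma(\hat\theta))^T\,\Sigma(\hat\theta)\,\nabla\gamma(\hat\theta)}, \tag{2.2}$$

where the denominator of the right-most term estimates the asymptotic variance of $\sqrt{n}\,(\gamma(\hat\theta) - \gamma(\theta))$, which by the delta method is given by

$$(\nabla\gamma(\theta))^T\,\Sigma(\theta)\,\nabla\gamma(\theta).$$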

Consider now a true distribution from $H_0$, that is, the true parameter satisfies $\gamma(\theta) = 0$. Without loss of generality, we assume that $\theta = 0$. If the gradient is nonzero at the true parameter, then the limiting distribution of $T_\gamma$ is the distribution of the random variable in (1.2) with $a = \nabla\gamma(0) \neq 0$ and $\Sigma = \Sigma(0)$. Hence, the limit is $\chi^2_1$. However, if $\nabla\gamma(0) = 0$ (i.e., the constraint $\gamma$ is singular at the true parameter), then the asymptotic distribution of $T_\gamma$ is no longer $\chi^2_1$ but rather given by (1.1) with the polynomial $f$ having higher degree; the degree of $f$ is determined by how many derivatives of $\gamma$ vanish at the true parameter.
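For concreteness, here is a direct Python transcription of (2.2), assuming the estimate $\hat\theta$ and the plug-in covariance $\Sigma(\hat\theta)$ are already available as plain lists; `wald_statistic` is a hypothetical helper name. The example uses the constraint $\gamma(\theta) = \theta_1\theta_2$, whose gradient vanishes at the origin, so at that point the limit of $T_\gamma$ is governed by (1.1) with a quadratic $f$ rather than by $\chi^2_1$.

```python
def wald_statistic(gamma, grad_gamma, theta_hat, sigma_hat, n):
    """Wald statistic T_gamma from (2.2) for a single constraint gamma(theta) = 0.

    theta_hat: estimate of theta (sequence of length k)
    sigma_hat: estimate of Sigma(theta_hat), a k x k nested list
    n: sample size
    """
    g = grad_gamma(theta_hat)
    k = len(g)
    # Delta-method variance estimate (grad gamma)^T Sigma (grad gamma).
    quad = sum(g[i] * sigma_hat[i][j] * g[j] for i in range(k) for j in range(k))
    return n * gamma(theta_hat) ** 2 / quad


def gamma(th):
    return th[0] * th[1]  # singular at the origin


def grad_gamma(th):
    return (th[1], th[0])


T = wald_statistic(gamma, grad_gamma, (0.1, 0.2), [[1.0, 0.0], [0.0, 1.0]], n=100)
print(T)  # 100 * 0.02**2 / (0.2**2 + 0.1**2), i.e. approximately 0.8
```

The denominator is zero only if $\hat\theta$ lands exactly on a point where $\nabla\gamma$ vanishes, which has probability zero for continuously distributed estimators.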

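Proposition 2.1. Assume that $\gamma(0) = 0$ and that there is a homogeneous polynomial $f$ of degree $d \ge 2$ such that, as $x \to 0$,

$$\gamma(x) = f(x) + o(\|x\|^d) \quad\text{and}\quad \nabla\gamma(x) = \nabla f(x) + o(\|x\|^{d-1}).$$

If $\sqrt{n}\,\hat\theta \xrightarrow{d} \mathcal{N}(0,\Sigma)$, then $T_\gamma \xrightarrow{d} W_{f,\Sigma}$.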
Example 2.2. Glonek (1993) studied testing collapsibility properties of $2\times2\times2$ contingency tables. Under an assumption of no three-way interaction, collapsibility with respect to a chosen margin amounts to the vanishing of at least one of two pairwise interactions, which we here simply denote by $\theta_1$ and $\theta_2$. In the $(\theta_1,\theta_2)$-plane, the hypothesis is the union of the two coordinate axes, which can be described as the solution set of $\gamma(\theta_1,\theta_2) = \theta_1\theta_2 = 0$ and tested using the Wald statistic $T_\gamma$ based on maximum likelihood estimates of $\theta_1$ and $\theta_2$. The hypothesis is singular at the origin, as reflected by the vanishing of $\nabla\gamma$ when $\theta_1 = \theta_2 = 0$. Away from the origin, $T_\gamma$ has the expected asymptotic $\chi^2_1$ distribution. At the origin, by Proposition 2.1, $T_\gamma$ converges to $W_{f,\Sigma}$, where $f(x) = x_1x_2$ and $\Sigma$ is the asymptotic covariance matrix of the two maximum likelihood estimates. The main result of Glonek (1993), stated as a theorem below, gives the distribution of $W_{f,\Sigma}$ in this case. Glonek's surprising result clarifies that the Wald test for this hypothesis is conservative at (and in finite samples near) the intersection of the two sets making up the null hypothesis.

Theorem 2.3 (Glonek, 1993). If $f(x) = x_1x_2$ and $\Sigma$ is any positive semidefinite $2\times2$ matrix with positive diagonal entries, then

$$W_{f,\Sigma} \sim \frac14\,\chi^2_1.$$

Before turning to concrete problems, we make two simple observations that we will use to bring $(f, \Sigma)$ into a convenient form.

Lemma 2.4. Let $f \in \mathbb{R}[x_1,\dots,x_k]$ be a homogeneous polynomial, and let $\Sigma$ be a positive semidefinite $k\times k$ matrix with positive diagonal entries.

(i) If $c \in \mathbb{R}\setminus\{0\}$ is a nonzero scalar, then $W_{cf,\Sigma} = W_{f,\Sigma}$.
(ii) If $B$ is an invertible $k\times k$ matrix, then $W_{f\circ B,\,B^{-1}\Sigma B^{-T}}$ has the same distribution as $W_{f,\Sigma}$.

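Proof. (i) Obvious, since $\nabla(cf) = c\,\nabla f$. (ii) Let $X \sim \mathcal{N}(0,\Sigma)$ and define

$$Y = B^{-1}X \sim \mathcal{N}(0, B^{-1}\Sigma B^{-T}).$$

Then $f(X) = (f\circ B)(Y)$ and $\nabla(f\circ B)(Y) = B^T\nabla f(X)$. Substituting into (1.1) gives

$$W_{f,\Sigma} = \frac{(f\circ B)(Y)^2}{(\nabla(f\circ B)(Y))^T\,B^{-1}\Sigma B^{-T}\,\nabla(f\circ B)(Y)} = W_{f\circ B,\,B^{-1}\Sigma B^{-T}}. \qquad \square$$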
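Because the proof of Lemma 2.4(ii) establishes a pointwise identity, it can be checked numerically at a single point. The Python sketch below uses the cubic $f(x) = x_1^2x_2$, an invertible $B$ and a positive definite $\Sigma$ that are all chosen for illustration only, and evaluates both sides with plain $2\times2$ list arithmetic.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matvec(a, v):
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def w_ratio(f_val, grad_val, sigma):
    """f^2 / (grad^T Sigma grad), the ratio in (1.1) evaluated at a fixed point."""
    quad = sum(grad_val[i] * sigma[i][j] * grad_val[j] for i in range(2) for j in range(2))
    return f_val ** 2 / quad


def f(x):
    return x[0] ** 2 * x[1]  # illustrative homogeneous cubic


def grad_f(x):
    return [2 * x[0] * x[1], x[0] ** 2]


sigma = [[2.0, 0.5], [0.5, 1.0]]
B = [[1.0, 2.0], [-1.0, 3.0]]
B_inv = inverse(B)

x = [0.7, -1.3]
y = matvec(B_inv, x)                                        # Y = B^{-1} X
f_B_at_y = f(matvec(B, y))                                  # (f o B)(Y) = f(X)
grad_f_B_at_y = matvec(transpose(B), grad_f(matvec(B, y)))  # B^T grad f(X)
sigma_B = matmul(matmul(B_inv, sigma), transpose(B_inv))    # B^{-1} Sigma B^{-T}

lhs = w_ratio(f(x), grad_f(x), sigma)
rhs = w_ratio(f_B_at_y, grad_f_B_at_y, sigma_B)
print(lhs, rhs)  # equal up to floating-point rounding
```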
3. Bivariate Monomials. In this section, we study the random variable $W_{f,\Sigma}$ when $f(x) = x_1^{\alpha_1}x_2^{\alpha_2}$. If the exponents $\alpha_1, \alpha_2$ are positive integers, then $f$ is a bivariate monomial. However, all our arguments go through for a slightly more general case in which $\alpha_1, \alpha_2$ are positive real numbers. Our main result is that the distribution of $W_{f,\Sigma}$ does not depend on $\Sigma$.

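Theorem 3.1. Let $f(x) = x_1^{\alpha_1}x_2^{\alpha_2}$ with $\alpha_1, \alpha_2 > 0$, and let $\Sigma$ be any positive semidefinite $2\times2$ matrix with positive diagonal entries. Then

$$W_{f,\Sigma} \sim \frac{1}{(\alpha_1+\alpha_2)^2}\,\chi^2_1.$$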

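Proof. As shown in Section 1, the claim is true if $\Sigma = (\sigma_{ij})$ is diagonal. It thus suffices to show that $W_{f,\Sigma}$ has the same distribution as $W_f := W_{f,I}$.

By Lemma 2.4, we can assume without loss of generality that $\sigma_{11} = \sigma_{22} = 1$ and $\rho := \sigma_{12} \ge 0$. Since

$$\frac{1}{W_{f,\Sigma}} = \frac{\alpha_1^2}{X_1^2} + \frac{2\rho\,\alpha_1\alpha_2}{X_1X_2} + \frac{\alpha_2^2}{X_2^2},$$

we can also assume $\alpha_1 = 1$ for simplicity. With $a = 1/\alpha_2$, we have

$$\frac{1}{W_{f,\Sigma}} = \frac{1}{X_1^2} + \frac{2\rho}{a\,X_1X_2} + \frac{1}{a^2X_2^2},$$

and need to show that

$$W_{f,\Sigma} \sim \frac{1}{(1+a^{-1})^2}\,\chi^2_1.$$

If $\rho = 1$, then $X_1$ and $X_2$ are almost surely equal and it is clear that $W_{f,\Sigma}$ has the same distribution as $W_f$. Hence, it remains to consider $0 < \rho < 1$.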

Let $Z_1$ and $Z_2$ be independent standard normal random variables. When expressing $Z_1 = R\cos(\Psi)$ and $Z_2 = R\sin(\Psi)$ in polar coordinates, it holds that $R$ and $\Psi$ are independent, and $\Psi$ is uniformly distributed over $[0, 2\pi]$. Let $\rho = \sin(\phi)$ with $0 < \phi < \pi/2$; then the joint distribution of $X_1$ and $X_2$ can be represented as

$$X_1 = R\cos(\Psi - \phi/2), \qquad X_2 = R\sin(\Psi + \phi/2),$$

which leads to

$$\frac{1}{W_{f,\Sigma}} = \frac{1}{R^2}\cdot\frac{1}{\Gamma}$$
with

$$\frac{1}{\Gamma} = \frac{1}{\cos^2(\Psi - \phi/2)} + \frac{2\sin(\phi)}{a\cos(\Psi - \phi/2)\sin(\Psi + \phi/2)} + \frac{1}{a^2\sin^2(\Psi + \phi/2)}.$$

Routine trigonometric calculations show that $\Gamma$ can be expressed as a function of the doubled angle $2\Psi$. More precisely,

$$\Gamma = \frac{a^2}{4}\,t(2\Psi, \phi),$$

where

$$t(\psi, \phi) = \frac{2 - \cos(2\phi) + 2\cos(\psi - \phi) - \cos(2\psi) - 2\cos(\psi + \phi)}{1 - a\cos(2\phi) + (1+a)\,[a + \cos(\psi - \phi) - a\cos(\psi + \phi)]}.$$

Since $2\Psi$ is uniformly distributed on $[0, 4\pi]$ and $t$ is $2\pi$-periodic in its first argument, the distribution of $\Gamma$ is independent of $\phi$ if and only if the same is true for the distribution of $T = t(\Psi, \phi)$.
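We proceed by calculating the moments of $T$ and show that they are independent of $\phi$. For each $0 < \phi < \pi/2$, there exists a small interval $I = [\phi - \epsilon, \phi + \epsilon]$ such that when $m \ge 1$, the function

$$\sup_{\phi' \in I}\left|[t(\psi,\phi')]^{m-1}\,\frac{\partial}{\partial\phi}t(\psi,\phi')\right|$$

is integrable over $0 \le \psi \le 2\pi$. Therefore, we may differentiate under the integral sign and obtain

$$\frac{d}{d\phi}\,E[T^m] = \frac{m}{2\pi}\int_0^{2\pi}[t(\psi,\phi)]^{m-1}\,\frac{\partial}{\partial\phi}t(\psi,\phi)\,d\psi. \tag{3.1}$$

The expression of $\frac{\partial}{\partial\phi}t(\psi,\phi)$ is long, so we omit it here.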

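We introduce the complex numbers $z = e^{i\psi}$ and $\sigma = e^{-i\phi}$, and express the functions $t(\psi,\phi)$ and $\frac{\partial}{\partial\phi}t(\psi,\phi)$ in terms of $z$ and $\sigma$:

$$t(\psi,\phi) = u(z,\sigma) = \frac{(\sigma - z)^2(1 + \sigma z)^2}{z\,(\sigma + a\sigma + \sigma^2z - az)(-1 + a\sigma^2 - \sigma z - a\sigma z)},$$

$$\frac{\partial}{\partial\phi}t(\psi,\phi) = v(z,\sigma) = \frac{\sigma(\sigma - z)(1 + \sigma z)(1 + a\sigma^2 + 2\sigma z - 2a\sigma z + \sigma^2z^2 + az^2)}{iz\,(\sigma + a\sigma + \sigma^2z - az)^2(-1 + a\sigma^2 - \sigma z - a\sigma z)^2} \times (-\sigma - a\sigma - z - \sigma^2z + az + a\sigma^2z - \sigma z^2 - a\sigma z^2).$$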

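The integral in (3.1) can be computed as a complex contour integral on the unit circle $\mathbb{T} = \{z : |z| = 1\}$:

$$\int_0^{2\pi}[t(\psi,\phi)]^{m-1}\,\frac{\partial}{\partial\phi}t(\psi,\phi)\,d\psi = \oint_{\mathbb{T}}[u(z,\sigma)]^{m-1}\,v(z,\sigma)\,\frac{dz}{iz}.$$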
Let

$$q(z,\sigma) = [u(z,\sigma)]^{m-1}\,v(z,\sigma)\,\frac{1}{iz}.$$

As a function of $z$, it has two poles within the unit disc. These two poles are at $z_0 = 0$ and $z_1 = (a\sigma^2 - 1)/(\sigma + a\sigma)$ and have the same order $m + 1$. By the Residue Theorem, we know

$$\frac{1}{2\pi i}\oint_{\mathbb{T}} q(z,\sigma)\,dz = \mathrm{Res}(q; 0) + \mathrm{Res}(q; z_1), \tag{3.2}$$

where $\mathrm{Res}(q;0)$ and $\mathrm{Res}(q;z_1)$ are the residues at $0$ and $z_1$, respectively. Let $C_0 = \{ce^{i\psi} : 0 \le \psi \le 2\pi\}$ be a small circle around $0$ such that $z_1$ is outside the circle. Let $S$ be the Möbius transform

$$S(w) = \frac{z_1 - w}{1 - \bar z_1 w}.$$

Then $S$ is one-to-one from the unit disk onto itself and maps $0$ to $z_1$, and $C_0$ to a closed curve $C_1 = \{S(ce^{i\psi}) : 0 \le \psi \le 2\pi\}$ around $z_1$ with winding number one. It holds that

$$\mathrm{Res}(q; z_1) = \frac{1}{2\pi i}\oint_{C_1} q(z,\sigma)\,dz = \frac{1}{2\pi i}\oint_{C_0} q(S(w),\sigma)\,S'(w)\,dw.$$

It also holds that

$$q(S(w),\sigma)\,S'(w) = -q(w,\sigma),$$

from which we deduce

$$\frac{1}{2\pi i}\oint_{C_0} q(S(w),\sigma)\,S'(w)\,dw = -\frac{1}{2\pi i}\oint_{C_0} q(w,\sigma)\,dw = -\mathrm{Res}(q; 0).$$

Hence, the integral in (3.2) is zero.

We have shown that the integral in (3.1) is zero for every $m \ge 1$, which means that the moments of $T$ do not depend on $\phi$ for $0 < \phi < \pi/2$, and, by continuity, for $0 \le \phi < \pi/2$. When $\phi = 0$, the random variable $T$ is bounded, so its moments uniquely determine the distribution. Therefore, the distribution of $T$ does not depend on $\phi$, and the proof is complete. □
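The heart of the proof is that the moments of $T = t(\Psi,\phi)$ do not depend on $\phi$. Since $t(\cdot,\phi)$ is smooth and $2\pi$-periodic, its moments can be computed to high accuracy with the trapezoidal rule, which gives a cheap numerical cross-check of this claim. The Python sketch below is illustrative only; the grid size and the values of $a$ and $\phi$ are arbitrary. Combining $W_{f,\Sigma} = R^2\Gamma = \frac{a^2}{4}R^2T$ with $E[R^2] = 2$ and Theorem 3.1, one also expects $E[T] = 2/(1+a)^2$.

```python
import math


def t_func(psi, phi, a):
    """t(psi, phi) from the proof of Theorem 3.1, with a = 1 / alpha_2 (and alpha_1 = 1)."""
    num = (2 - math.cos(2 * phi) + 2 * math.cos(psi - phi)
           - math.cos(2 * psi) - 2 * math.cos(psi + phi))
    den = (1 - a * math.cos(2 * phi)
           + (1 + a) * (a + math.cos(psi - phi) - a * math.cos(psi + phi)))
    return num / den


def moment_T(m, phi, a, n_grid=4096):
    """E[T^m] for T = t(Psi, phi) with Psi ~ Uniform[0, 2*pi].

    On an equispaced periodic grid the trapezoidal rule is the plain average of
    the samples; for smooth periodic integrands it converges very quickly.
    """
    h = 2 * math.pi / n_grid
    return sum(t_func(j * h, phi, a) ** m for j in range(n_grid)) / n_grid


a = 0.5  # alpha_1 = 1, alpha_2 = 2
for m in (1, 2, 3):
    print(m, [round(moment_T(m, phi, a), 10) for phi in (0.0, 0.4, 1.0)])
print("predicted E[T]:", 2 / (1 + a) ** 2)
```

Each printed row should be constant across the three values of $\phi$, in line with the vanishing of the integral in (3.1).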

Remark 3.2. If $\alpha_1 = \alpha_2$, then Theorem 3.1 reduces to Theorem 2.3. In this case, our proof above would only need to treat $a = 1$. Glonek's proof of Theorem 2.3 finds the distribution function of a random variable related to our $T$. If $a = 1$, this requires solving a quadratic equation. When $a \neq 1$, we were unable to extend this approach, as a complicated quartic equation arises in the computation of the distribution function. We thus turned to the presented method of moments.
Let $X = (X_1, X_2)^T$ and $Y = (Y_1, Y_2)^T$ be two independent $\mathcal{N}_2(0,\Sigma)$ random vectors, where $\Sigma$ has positive diagonal entries. Let $p_1, p_2$ be nonnegative numbers such that $p_1 + p_2 = 1$. The random variable

$$Q = \frac{p_1X_2Y_1 + p_2X_1Y_2}{\sqrt{(p_1X_2, p_2X_1)\,\Sigma\,(p_1X_2, p_2X_1)^T}}$$

has the standard normal distribution, and is independent of $X$. For $f(x) = x_1^{p_1}x_2^{p_2}$, we have $\nabla f(x) = f(x)\,(p_1/x_1, p_2/x_2)$, so that $W_{f,\Sigma}$ from (1.1) satisfies

$$\frac{1}{\sqrt{W_{f,\Sigma}}} = \frac{\sqrt{(p_1X_2, p_2X_1)\,\Sigma\,(p_1X_2, p_2X_1)^T}}{|X_1X_2|}.$$

Then

$$p_1\frac{Y_1}{X_1} + p_2\frac{Y_2}{X_2} = \mathrm{sign}(X_1X_2)\,\frac{Q}{\sqrt{W_{f,\Sigma}}}. \tag{3.3}$$

By taking the conditional expectation given $X$, the characteristic function of (3.3) is seen to be

$$E\left[\exp\{itQ/\sqrt{W_{f,\Sigma}}\}\right] = E\left[\exp\{-t^2/(2W_{f,\Sigma})\}\right].$$

The uniqueness of the moment generating function for positive random variables (Billingsley, 1995, Thm. 22.2) yields that (3.3) has a standard Cauchy distribution (with characteristic function $e^{-|t|}$) if and only if $W_{f,\Sigma} \sim \chi^2_1$. Therefore, we have the following equivalent version of Theorem 3.1.

Corollary 3.3. Let $X = (X_1, X_2)^T$ and $Y = (Y_1, Y_2)^T$ be independent $\mathcal{N}_2(0,\Sigma)$ random vectors, where $\Sigma$ has positive diagonal entries. If $p_1, p_2$ are nonnegative numbers such that $p_1 + p_2 = 1$, then the random variable

$$p_1\frac{Y_1}{X_1} + p_2\frac{Y_2}{X_2}$$

has the standard Cauchy distribution.
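Corollary 3.3 is easy to probe by simulation. The Python sketch below (standard library only; $\Sigma$, $p_1$, the sample size and the seed are illustrative choices, and the helper names are hypothetical) draws the ratio combination and compares empirical values of $P(|C| \le q)$ with the standard Cauchy values $\frac{2}{\pi}\arctan(q)$.

```python
import math
import random


def draw_normal_pair(rng, l11, l21, l22):
    """One draw from N_2(0, Sigma), given the Cholesky entries of Sigma."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2


def sample_corollary_3_3(p1, sigma, n, seed=2):
    """Draws of p1 * Y1 / X1 + p2 * Y2 / X2 with X, Y independent N_2(0, Sigma)."""
    rng = random.Random(seed)
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[0][1] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    p2 = 1.0 - p1
    out = []
    for _ in range(n):
        x1, x2 = draw_normal_pair(rng, l11, l21, l22)
        y1, y2 = draw_normal_pair(rng, l11, l21, l22)
        out.append(p1 * y1 / x1 + p2 * y2 / x2)
    return out


draws = sample_corollary_3_3(p1=0.3, sigma=[[1.0, -0.8], [-0.8, 2.0]], n=100_000)
for q in (0.5, 1.0, 3.0):
    empirical = sum(abs(d) <= q for d in draws) / len(draws)
    print(q, empirical, 2.0 / math.pi * math.atan(q))  # standard Cauchy P(|C| <= q)
```

Probabilities of the form $P(|C| \le q)$ are used instead of moments because the Cauchy distribution has none.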

4. Quadratic Forms. In this section, we consider the distribution of $W_{f,\Sigma}$ when $f$ is a quadratic form, that is,

$$f(x_1, x_2, \dots, x_k) = \sum_{1 \le i \le j \le k} a'_{ij}x_ix_j$$

for real coefficients $a'_{ij}$. Equivalently,

$$f(x_1, x_2, \dots, x_k) = x^TAx, \tag{4.1}$$

where $A = (a_{ij})$ is symmetric, with $a_{ii} = a'_{ii}$ and $a_{ij} = a_{ji} = a'_{ij}/2$ for $i < j$.




4.1. Canonical form. Let $I$ denote the $k\times k$ identity matrix. We use the shorthand

$$W_f := W_{f,I}$$

when the covariance matrix $\Sigma$ is the identity.

Lemma 4.1. If $f \in \mathbb{R}[x_1,\dots,x_k]$ is homogeneous of degree $d$ and $\Sigma$ is a positive semidefinite $k\times k$ matrix, then $W_{f,\Sigma}$ has the same distribution as $W_g$, where $g$ is a homogeneous degree $d$ polynomial in $\mathrm{rank}(\Sigma)$ many variables.

Proof. If $\Sigma$ has full rank, then $\Sigma = BB^T$ for an invertible matrix $B$. Use Lemma 2.4(ii) to transform $W_{f,\Sigma}$ to $W_g$, where $g = f\circ B$ is homogeneous of degree $d$. If $\Sigma$ has rank $m < k$, then $\Sigma = BE_mB^T$, where $B$ is invertible and $E_m$ is zero apart from the first $m$ diagonal entries, which are equal to one. Form $g$ by substituting $x_{m+1} = \cdots = x_k = 0$ into $f\circ B$. □

Further simplifications are possible for a treatment of the random variables $W_f$. In the case of quadratic forms, we may restrict attention to canonical forms $f(x) = \lambda_1x_1^2 + \cdots + \lambda_kx_k^2$, as shown in the next lemma.

Lemma 4.2. Let $f(x) = x^TAx$ be a quadratic form given by a symmetric $k\times k$ matrix $A \neq 0$. If $\Sigma$ is a positive definite $k\times k$ matrix and $\lambda_1,\dots,\lambda_k$ are the eigenvalues of $A\Sigma$, then $W_{f,\Sigma}$ has the same distribution as

$$\frac{(\lambda_1Z_1^2 + \cdots + \lambda_kZ_k^2)^2}{4(\lambda_1^2Z_1^2 + \cdots + \lambda_k^2Z_k^2)}, \tag{4.2}$$

where $Z_1,\dots,Z_k$ are independent standard normal random variables.

Proof. Write $\Sigma = BB^T$ for an invertible matrix $B$. By Lemma 2.4(ii), $W_{f,\Sigma}$ has the same distribution as $W_g$ with $g(x) = x^T(B^TAB)x$. Let

$$Q^T(B^TAB)Q = \mathrm{diag}(\lambda_1,\dots,\lambda_k)$$

be the spectral decomposition of $B^TAB$, with $Q$ orthogonal. Then $\lambda_1,\dots,\lambda_k$ are also the eigenvalues of $A\Sigma$. Applying Lemma 2.4(ii) again, we find that $W_{f,\Sigma}$ has the same distribution as $W_h$ with

$$h(x) = x^T(Q^TB^TABQ)x = \lambda_1x_1^2 + \cdots + \lambda_kx_k^2.$$

Since $\nabla h(x) = 2(\lambda_1x_1,\dots,\lambda_kx_k)$, the claim follows. □
In (4.2), the set of eigenvalues $\{\lambda_i : 1 \le i \le k\}$ can be scaled to $\{c\lambda_i : 1 \le i \le k\}$ for any $c \neq 0$ without changing the distribution; recall also Lemma 2.4(i). For instance, we may scale one nonzero eigenvalue to become equal to one. When all (scaled) $\lambda_i$ are in $\{-1, 1\}$, the description of the distribution of $W_{f,\Sigma}$ can be simplified. We write $\mathrm{Beta}(\alpha, \beta)$ for the Beta distribution with parameters $\alpha, \beta > 0$.

Lemma 4.3. Let $k_1$ and $k_2$ be two positive integers, and let $k = k_1 + k_2$. If $f(x_1,\dots,x_k) = x_1^2 + \cdots + x_{k_1}^2 - x_{k_1+1}^2 - \cdots - x_{k_1+k_2}^2$, then $W_f$ has the same distribution as

$$\frac14R^2(2B - 1)^2,$$

where $R^2$ and $B$ are independent, $R^2 \sim \chi^2_k$, and $B \sim \mathrm{Beta}(k_1/2, k_2/2)$.

Proof. The distribution of $W_f$ is that of

$$\frac14\cdot\frac{(Z_1^2 + \cdots + Z_{k_1}^2 - Z_{k_1+1}^2 - \cdots - Z_{k_1+k_2}^2)^2}{Z_1^2 + \cdots + Z_{k_1+k_2}^2}$$

with $Z_1,\dots,Z_k$ independent and standard normal. Let

$$Y_1 := Z_1^2 + \cdots + Z_{k_1}^2 \sim \chi^2_{k_1}, \qquad Y_2 := Z_{k_1+1}^2 + \cdots + Z_k^2 \sim \chi^2_{k_2}.$$

Then $R^2 := Y_1 + Y_2 \sim \chi^2_k$. Representing $Z_1,\dots,Z_k$ in polar coordinates shows that $R^2$ and $W_f/R^2$ are independent (Muirhead, 1982, Thm. 1.5.5). Since $B = Y_1/(Y_1 + Y_2) \sim \mathrm{Beta}(k_1/2, k_2/2)$ and $(Y_1 - Y_2)^2/(Y_1 + Y_2)^2 = (2B - 1)^2$, we deduce that the two random variables $W_f/R^2$ and $\frac14(2B - 1)^2$ have the same distribution. □

We note that when $k = 4$ and $k_1 = k_2 = 2$, Lemma 4.3 gives the equality of distributions

$$W_f \sim \frac14R^2U^2, \tag{4.3}$$

where $R^2 \sim \chi^2_4$ and $U \sim \mathrm{Uniform}[0, 1]$ are independent. The equality holds because, in this special case, $U = Y_1/(Y_1 + Y_2)$ is uniformly distributed on $[0, 1]$, and $(2U - 1)^2$ has the same distribution as $U^2$. The distribution from (4.3) will appear in Section 5.
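To see Lemma 4.3 and (4.3) at work, the following Python sketch (standard library only; sample size and seed are arbitrary) simulates $W_f$ for $f = x_1^2 + x_2^2 - x_3^2 - x_4^2$ directly, simulates $\frac14R^2U^2$ using $\chi^2_4 = \mathrm{Gamma}(\text{shape } 2, \text{scale } 2)$, and compares both empirical distribution functions with the closed form $1 - e^{-2t} + \sqrt{2\pi t}\,(1 - \Phi(2\sqrt{t}))$ of the distribution function of $\frac14R^2U^2$ that is stated in Section 5.

```python
import math
import random
from statistics import NormalDist


def w_indefinite(z, k1):
    """W_f for f = z_1^2 + ... + z_{k1}^2 - z_{k1+1}^2 - ... with Sigma = I (grad f = 2 * (+/- z))."""
    y1 = sum(v * v for v in z[:k1])
    y2 = sum(v * v for v in z[k1:])
    return (y1 - y2) ** 2 / (4.0 * (y1 + y2))


def f_sing_cdf(t):
    """Distribution function of R^2 U^2 / 4 with R^2 ~ chi2_4 and U ~ Uniform[0, 1]."""
    phi = NormalDist().cdf
    return 1.0 - math.exp(-2.0 * t) + math.sqrt(2.0 * math.pi * t) * (1.0 - phi(2.0 * math.sqrt(t)))


rng = random.Random(3)
n = 100_000
direct = [w_indefinite([rng.gauss(0.0, 1.0) for _ in range(4)], k1=2) for _ in range(n)]
# chi2_4 is Gamma(shape 2, scale 2); random.gammavariate takes (shape, scale).
via_lemma = [rng.gammavariate(2.0, 2.0) * rng.random() ** 2 / 4.0 for _ in range(n)]

for t in (0.05, 0.25, 1.0):
    print(t,
          sum(w <= t for w in direct) / n,
          sum(w <= t for w in via_lemma) / n,
          f_sing_cdf(t))
```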

For general eigenvalues $\lambda_i$, it seems that the distribution from (4.2) cannot be described in such simple terms.
4.2. Classification of bivariate quadratic forms. We now turn to the bivariate case ($k = 2$), that is, we are considering a quadratic form in two variables,

$$f(x_1, x_2) = ax_1^2 + 2bx_1x_2 + cx_2^2.$$

In this case, we are able to give a full classification of the possible distributions of $W_{f,\Sigma}$ in terms of linear combinations of a pair of independent $\chi^2_1$ random variables; see Johnson, Kotz and Balakrishnan (1994, Sect. 18.8) for a discussion of such distributions. Our classification reveals that for $k = 2$ the distributions for quadratic forms are stochastically bounded below and above by $\chi^2_1/4$ and $\chi^2_2/4$, respectively.

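Theorem 4.4. Let $\Sigma$ be a positive definite matrix, and let $f(x_1, x_2) = ax_1^2 + 2bx_1x_2 + cx_2^2$ be a nonzero quadratic form with matrix

$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$

(a) If $b^2 - ac \ge 0$, then $W_{f,\Sigma} \sim \chi^2_1/4$.
(b) If $b^2 - ac < 0$, then

$$W_{f,\Sigma} \sim \frac14\left(Z_1^2 + \frac{4\det(A\Sigma)}{\mathrm{tr}(A\Sigma)^2}\,Z_2^2\right),$$

where $Z_1$ and $Z_2$ are independent standard normal random variables.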

Proof. (a) When the discriminant $b^2 - ac \ge 0$, $f$ factors into a product of two linear forms. The joint distribution of the two linear forms is bivariate normal. Write $\Sigma'$ for the covariance matrix of the linear forms; then the distribution of $W_{f,\Sigma}$ is equal to the distribution of $W_{g,\Sigma'}$ with $g(x_1, x_2) = x_1x_2$. Hence, the distribution is $\chi^2_1/4$ by Theorem 2.3 or Theorem 3.1.

(b) In this case, the discriminant is negative and $f$ does not factor. By Lemma 4.2, we can assume $\Sigma = I$ and consider the distribution of $W_f$ for $f(x_1, x_2) = \lambda_1x_1^2 + \lambda_2x_2^2$, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A\Sigma$. Since $\det(A\Sigma) = \lambda_1\lambda_2$ and $\mathrm{tr}(A\Sigma) = \lambda_1 + \lambda_2$, to prove the claim, we must show that in this case

$$W_f \sim \frac14\left(Z_1^2 + \frac{4c}{(1+c)^2}\,Z_2^2\right), \tag{4.4}$$

where $c = \lambda_2/\lambda_1 > 0$. By Lemma 2.4(i), we may take $f(x_1, x_2) = x_1^2 + cx_2^2$.

To show (4.4), we use polar coordinates again. So represent the two considered independent standard normal random variables as $X_1 = R\cos(\Psi)$ and $X_2 = R\sin(\Psi)$, where $R^2 \sim \chi^2_2$ and $\Psi \sim \mathrm{Uniform}[0, 2\pi]$ are independent. Then

$$W_f = \frac{R^2}{4}\cdot\frac{[\cos(\Psi)^2 + c\sin(\Psi)^2]^2}{\cos(\Psi)^2 + c^2\sin(\Psi)^2} = \frac{R^2}{4}\left(1 - \frac{(1-c)^2}{(1+c)^2}\cdot\frac{(1+c)^2\cos(\Psi)^2\sin(\Psi)^2}{\cos(\Psi)^2 + c^2\sin(\Psi)^2}\right).$$

Using Lemma 4.5, we have

$$W_f \sim \frac{R^2}{4}\left(1 - \frac{(1-c)^2}{(1+c)^2}\cos(\Psi)^2\right) = \frac14\left(\frac{4c}{(1+c)^2}\,R^2\cos(\Psi)^2 + R^2\sin(\Psi)^2\right). \tag{4.5}$$

This is the claim from (4.4) because $R\cos(\Psi)$ and $R\sin(\Psi)$ are independent and standard normal. □

Lemma 4.5. If $c > 0$ and $\Psi$ has a uniform distribution over $[0, 2\pi]$, then

$$S_c(\Psi) := \frac{(1+c)^2\cos(\Psi)^2\sin(\Psi)^2}{\cos(\Psi)^2 + c^2\sin(\Psi)^2} \sim \cos(\Psi)^2.$$

Proof. Let $R^2 \sim \chi^2_2$ be independent of $\Psi$. Then $R\sin(\Psi)$ and $R\cos(\Psi)$ are independent and standard normal. Therefore,

$$\frac{1}{R^2S_c(\Psi)} = \frac{1}{(1+c)^2[R\sin(\Psi)]^2} + \frac{c^2}{(1+c)^2[R\cos(\Psi)]^2}$$

is the sum of two independent random variables that follow the one-sided stable distribution of index $\frac12$. Since $c > 0$, the first summand has the stable distribution with parameter $1/(1+c)$ and the second summand has parameter $c/(1+c)$. Hence, by (1.4), their sum follows a stable law with parameter 1. Expressing this in terms of the reciprocals,

$$R^2S_c(\Psi) \sim \chi^2_1 \sim R^2\cos(\Psi)^2.$$

It follows that $S_c(\Psi)$ has the same distribution as $\cos(\Psi)^2$. For instance, we may argue that $S_c(\Psi)$ and $\cos(\Psi)^2$ have identical moments, which implies equality of the distributions as both are compactly supported. □
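Lemma 4.5 can be checked numerically by comparing moments: for $\Psi$ uniform on $[0, 2\pi]$, $E[\cos(\Psi)^{2m}] = \binom{2m}{m}/4^m$. The Python sketch below computes $E[S_c(\Psi)^m]$ with the periodic trapezoidal rule; the values of $c$ are arbitrary, and $c = -0.5$ is included to illustrate the remark that the lemma fails for $c < 0$.

```python
import math


def s_c(psi, c):
    """S_c(psi) from Lemma 4.5."""
    cos2, sin2 = math.cos(psi) ** 2, math.sin(psi) ** 2
    return (1 + c) ** 2 * cos2 * sin2 / (cos2 + c * c * sin2)


def uniform_moment(func, m, n_grid=4096):
    """E[func(Psi)^m] for Psi ~ Uniform[0, 2*pi], by the periodic trapezoidal rule."""
    h = 2 * math.pi / n_grid
    return sum(func(j * h) ** m for j in range(n_grid)) / n_grid


target = [math.comb(2 * m, m) / 4 ** m for m in (1, 2, 3)]  # moments of cos(Psi)^2
print("cos^2 :", target)
for c in (0.2, 1.0, 5.0, -0.5):
    print(c, [round(uniform_moment(lambda p: s_c(p, c), m), 10) for m in (1, 2, 3)])
```

For positive $c$ the rows should reproduce $0.5, 0.375, 0.3125$; for $c = -0.5$ they should not.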

We remark that the claim of Lemma 4.5 is false for $c < 0$. Indeed, the distribution of $S_c(\Psi)$ varies with $c$ when $c < 0$.
4.3. Stochastic bounds. To understand possible conservativeness of Wald tests, it is interesting to look for stochastic bounds on $W_{f,\Sigma}$ that hold for all $f$ and $\Sigma$. We denote the stochastic ordering of two random variables as $U \le_{st} V$ when $P(U > t) \le P(V > t)$ for all $t \in \mathbb{R}$.

Proposition 4.6. If $f \in \mathbb{R}[x_1,\dots,x_k]$ is a quadratic form and $\Sigma$ any nonzero positive semidefinite $k\times k$ matrix, then $W_{f,\Sigma} \le_{st} \frac14\chi^2_k$. Equality is achieved when $f(x) = x_1^2 + \cdots + x_k^2$ and $\Sigma$ is the identity matrix.

Proof. The second claim is obvious. For the first claim, without loss of generality, we can restrict our attention to the distributions from (4.2). The Cauchy-Schwarz inequality gives

$$\frac{(\lambda_1Z_1^2 + \cdots + \lambda_kZ_k^2)^2}{4(\lambda_1^2Z_1^2 + \cdots + \lambda_k^2Z_k^2)} \le \frac{(Z_1^2 + \cdots + Z_k^2)(\lambda_1^2Z_1^2 + \cdots + \lambda_k^2Z_k^2)}{4(\lambda_1^2Z_1^2 + \cdots + \lambda_k^2Z_k^2)} = \frac14(Z_1^2 + \cdots + Z_k^2),$$

which is the desired chi-square bound. □

The considered Wald test rejects the hypothesis that $\gamma(\theta) = 0$ when the statistic $T_\gamma$ from (2.2) exceeds $c_\alpha$, where $c_\alpha$ is the $(1 - \alpha)$ quantile of the $\chi^2_1$ distribution. Let $k_\alpha$ be the largest degrees of freedom $k$ such that a $\frac14\chi^2_k$ random variable exceeds $c_\alpha$ with probability at most $\alpha$. According to Proposition 4.6, if the true parameter is a singularity at which $\gamma$ can be approximated by a quadratic form in at most $k_\alpha$ variables, then the Wald test is guaranteed to be asymptotically conservative. Some values are
$$k_{0.05} = 7, \quad k_{0.025} = 9, \quad k_{0.01} = 12, \quad k_{0.005} = 14, \quad k_{0.001} = 18.$$
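The values $k_\alpha$ can be computed with elementary functions. The Python sketch below assumes only the standard library: $c_\alpha$ is obtained as the squared $(1 - \alpha/2)$ standard normal quantile, and the $\chi^2_k$ survival function for integer $k$ is built from $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, $P(\chi^2_2 > x) = e^{-x/2}$ and the recursion $P(\chi^2_{k+2} > x) = P(\chi^2_k > x) + (x/2)^{k/2}e^{-x/2}/\Gamma(k/2 + 1)$. The function names are illustrative.

```python
import math
from statistics import NormalDist


def chi2_sf(k, x):
    """P(chi2_k > x) for an integer k >= 1 and x > 0."""
    if k % 2 == 1:
        q, j = math.erfc(math.sqrt(x / 2.0)), 1   # P(chi2_1 > x)
    else:
        q, j = math.exp(-x / 2.0), 2              # P(chi2_2 > x)
    while j < k:
        # P(chi2_{j+2} > x) = P(chi2_j > x) + (x/2)^(j/2) e^(-x/2) / Gamma(j/2 + 1)
        q += math.exp((j / 2.0) * math.log(x / 2.0) - x / 2.0 - math.lgamma(j / 2.0 + 1.0))
        j += 2
    return q


def k_alpha(alpha):
    """Largest k such that P(chi2_k / 4 > c_alpha) <= alpha."""
    c_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2  # (1 - alpha) quantile of chi2_1
    k = 0
    while chi2_sf(k + 1, 4.0 * c_alpha) <= alpha:
        k += 1
    return k


for alpha in (0.05, 0.025, 0.01, 0.005, 0.001):
    print(alpha, k_alpha(alpha))
```

Because $P(\frac14\chi^2_k > c_\alpha)$ increases with $k$, the linear search in `k_alpha` can stop at the first $k$ for which the bound exceeds $\alpha$.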
Turning to a lower bound, we can offer the following simple observation.

Proposition 4.7. Suppose the quadratic form $f$ is given by a symmetric $k\times k$ matrix $A \neq 0$, and suppose that $\Sigma$ is a positive definite $k\times k$ matrix such that all eigenvalues of $A\Sigma$ are nonnegative. Then $W_{f,\Sigma} \ge_{st} \frac14\chi^2_1$.

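Proof. Let $\lambda_1,\dots,\lambda_k \ge 0$ be the eigenvalues of $A\Sigma$. By scaling, we can assume without loss of generality that $\lambda_1 = 1$ and $0 \le \lambda_i \le 1$ for $2 \le i \le k$. Then

$$\frac{(\lambda_1Z_1^2 + \cdots + \lambda_kZ_k^2)^2}{4(\lambda_1^2Z_1^2 + \cdots + \lambda_k^2Z_k^2)} \ge \frac{(\lambda_1^2Z_1^2 + \cdots + \lambda_k^2Z_k^2)^2}{4(\lambda_1^2Z_1^2 + \cdots + \lambda_k^2Z_k^2)} \ge \frac14Z_1^2,$$

and the claim follows from Lemma 4.2. □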
Proposition 4.7, Theorem 4.4 and simulation experiments lead us to conjecture that $\frac14\chi^2_1$ is still a stochastic lower bound when there are both positive and negative eigenvalues $\lambda_i$.

Conjecture 4.8. For any quadratic form $f \neq 0$ and any positive semidefinite matrix $\Sigma \neq 0$, the distribution of $W_{f,\Sigma}$ stochastically dominates $\frac14\chi^2_1$.

While we do not know how to prove this conjecture in general, we are able to treat the special case where the eigenvalues $\lambda_i$ are either $1$ or $-1$.

Theorem 4.9. Let $k_1, k_2 \ge 0$, and $k = k_1 + k_2$. If $f(x_1,\dots,x_k) = x_1^2 + \cdots + x_{k_1}^2 - x_{k_1+1}^2 - \cdots - x_{k_1+k_2}^2$, then $W_f \ge_{st} \frac14\chi^2_1$.

Proof. Without loss of generality we assume $k_1 \le k_2$. If $k_1 = 0$ or $k_1 = k_2 = 1$, the claim follows from Proposition 4.7 and Theorem 4.4, respectively. We now consider the case $k_1 \ge 1$ and $k_2 \ge 2$. By Lemma 4.3, we know

$$W_f \sim \frac14R^2(2B - 1)^2, \tag{4.6}$$

where $R^2$ and $B$ are independent, $R^2 \sim \chi^2_k$, and $B \sim \mathrm{Beta}(k_1/2, k_2/2)$. On the other hand, if $B' \sim \mathrm{Beta}(1/2, (k-1)/2)$ and is independent of $R^2$, then

$$R^2B' \sim \chi^2_1. \tag{4.7}$$

Let $g(x)$ and $h(x)$ be the density functions of $(2B - 1)^2$ and $B'$, respectively. The comparison of (4.6) and (4.7) shows that it suffices to prove that $(2B - 1)^2$ is stochastically larger than $B'$. We will show a stronger result, namely, that the likelihood ratio $g(x)/h(x)$ is an increasing function over $[0, 1]$. To simplify the argument, we rescale the density functions to

$$g(x)\sqrt{x} \propto (1 + \sqrt{x})^{k_1/2 - 1}(1 - \sqrt{x})^{k_2/2 - 1} + (1 - \sqrt{x})^{k_1/2 - 1}(1 + \sqrt{x})^{k_2/2 - 1}$$

and

$$h(x)\sqrt{x} \propto (1 - x)^{(k-3)/2} = (1 - \sqrt{x})^{(k_1+k_2-3)/2}(1 + \sqrt{x})^{(k_1+k_2-3)/2}.$$

For our purpose, it is equivalent to show the monotonicity of $g(x^2)/h(x^2)$, which is proportional to

$$\xi(x) := (1 + x)^{(-k_2+1)/2}(1 - x)^{(-k_1+1)/2} + (1 - x)^{(-k_2+1)/2}(1 + x)^{(-k_1+1)/2}.$$
When $k_1 = 1$, the derivative of $\xi(x)$ satisfies

$$2\xi'(x) = (k_2 - 1)(1 - x)^{(-k_2-1)/2} - (k_2 - 1)(1 + x)^{(-k_2-1)/2} > 0$$

when $0 < x < 1$, and thus the likelihood ratio is an increasing function. When $k_1 \ge 2$, we have

$$\begin{aligned}
2\xi'(x)(1 + x)^{(k_2+1)/2}(1 - x)^{(k_2+1)/2}
&= (1 + x)\left[(k_2 - 1)(1 + x)^{(k_2-k_1)/2} + (k_1 - 1)(1 - x)^{(k_2-k_1)/2}\right] \\
&\quad - (1 - x)\left[(k_1 - 1)(1 + x)^{(k_2-k_1)/2} + (k_2 - 1)(1 - x)^{(k_2-k_1)/2}\right] \\
&\ge (k_2 - k_1)\left[(1 + x)^{(k_2-k_1)/2} - (1 - x)^{(k_2-k_1)/2}\right] \ge 0
\end{aligned}$$

for all $0 < x < 1$. Therefore, $\xi(x)$ is an increasing function. □



5. Tetrads. We now turn to the problem that sparked our interest in Wald tests of singular hypotheses, namely, the problem of testing tetrad constraints on the covariance matrix $\Theta = (\theta_{ij})$ of a random vector $Y$ in $\mathbb{R}^p$ with $p \ge 4$. A tetrad is a $2\times2$ subdeterminant that only involves off-diagonal entries and, without loss of generality, we consider the tetrad

$$\gamma(\Theta) = \theta_{13}\theta_{24} - \theta_{14}\theta_{23} = \det\begin{pmatrix}\theta_{13} & \theta_{14} \\ \theta_{23} & \theta_{24}\end{pmatrix}. \tag{5.1}$$

Example 5.1. Consider a factor analysis model in which the coordinates of $Y$ are linear functions of a latent variable $X$ and noise terms. More precisely, $Y_i = \beta_{0i} + \beta_iX + \epsilon_i$, where $X \sim \mathcal{N}(0, 1)$ is independent of $\epsilon_1,\dots,\epsilon_p$, which in turn are independent normal random variables. Then the covariance between $Y_i$ and $Y_j$, $i \neq j$, is $\theta_{ij} = \beta_i\beta_j$, and the tetrad from (5.1) vanishes.

Suppose now that we observe a sample of independent and identically distributed random vectors $Y^{(1)},\dots,Y^{(n)}$ with covariance matrix $\Theta$. Let $\bar Y_n$ be the sample mean vector, and let

$$\hat\Theta = \frac1n\sum_{i=1}^n (Y^{(i)} - \bar Y_n)(Y^{(i)} - \bar Y_n)^T$$

be the empirical covariance matrix. Assuming that the data-generating distribution has finite fourth moments, it holds that

$$\sqrt{n}\,(\hat\Theta - \Theta) \xrightarrow{d} \mathcal{N}(0, V(\Theta))$$
with $k = p^2$. The rows and columns of the asymptotic covariance matrix $V(\Theta)$ are indexed by the pairs $ij := (i,j)$, $1 \le i, j \le p$. Since the tetrad from (5.1) only involves the covariances indexed by the pairs in $C = \{13, 14, 23, 24\}$, only the principal submatrix

$$\Sigma(\Theta) := V(\Theta)_{C\times C}$$

is of relevance for the large-sample distribution of the sample tetrad $\gamma(\hat\Theta)$. The gradient of the tetrad is

$$\nabla\gamma(\Theta) = (\theta_{24}, -\theta_{23}, -\theta_{14}, \theta_{13})^T.$$

Hence, if at least one of the four covariances in the tetrad is nonzero, the Wald statistic $T_\gamma$ converges to a $\chi^2_1$ distribution. If, on the other hand, $\theta_{13} = \theta_{14} = \theta_{23} = \theta_{24} = 0$, then the large-sample limit of $T_\gamma$ has the distribution of $W_{f,\Sigma(\Theta)}$, where

$$f(x) = x_1x_4 - x_2x_3$$

is a quadratic form in $k = 4$ variables; recall Proposition 2.1. This form can be written as $x^TAx$ with a matrix that is a Kronecker product, namely

$$A = \frac12\begin{pmatrix}0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0\end{pmatrix} = \frac12\begin{pmatrix}0 & 1 \\ -1 & 0\end{pmatrix}\otimes\begin{pmatrix}0 & 1 \\ -1 & 0\end{pmatrix}. \tag{5.2}$$

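If $Y$ is multivariate normal, then the asymptotic covariance matrix has the entries

$$V(\Theta)_{ij,kl} = \theta_{ik}\theta_{jl} + \theta_{il}\theta_{jk}.$$

In the singular case with $\theta_{13} = \theta_{14} = \theta_{23} = \theta_{24} = 0$, we thus have

$$\Sigma(\Theta) = \begin{pmatrix}
\theta_{11}\theta_{33} & \theta_{11}\theta_{34} & \theta_{12}\theta_{33} & \theta_{12}\theta_{34} \\
\theta_{11}\theta_{34} & \theta_{11}\theta_{44} & \theta_{12}\theta_{34} & \theta_{12}\theta_{44} \\
\theta_{12}\theta_{33} & \theta_{12}\theta_{34} & \theta_{22}\theta_{33} & \theta_{22}\theta_{34} \\
\theta_{12}\theta_{34} & \theta_{12}\theta_{44} & \theta_{22}\theta_{34} & \theta_{22}\theta_{44}
\end{pmatrix} = \begin{pmatrix}\theta_{11} & \theta_{12} \\ \theta_{12} & \theta_{22}\end{pmatrix}\otimes\begin{pmatrix}\theta_{33} & \theta_{34} \\ \theta_{34} & \theta_{44}\end{pmatrix},$$

which again is a Kronecker product. We remark that $\Sigma(\Theta)$ would also be a Kronecker product if we had started with an elliptical distribution instead of the normal, compare Iwashita and Siotani (1994, eqn. (2.1)), or if $(Y_1, Y_2)$ and $(Y_3, Y_4)$ were independent in the data-generating distribution.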

As we show next, in the singular case, the Kronecker structure of the two matrices $A$ and $\Sigma(\Theta)$ gives a limiting distribution of the Wald statistic for the tetrad that does not depend on the block-diagonal covariance matrix $\Theta$.
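The asymptotic conservativeness discussed in this section can be probed by simulation. The sketch below is not the authors' code. It assumes normal data, an illustrative block-diagonal $\Theta$, $n = 500$, and a plug-in statistic $T_\gamma = n\,\gamma(\hat\Theta)^2 / \big(\nabla\gamma(\hat\Theta)^T \Sigma(\hat\Theta) \nabla\gamma(\hat\Theta)\big)$, with $\Sigma(\hat\Theta)$ computed from the normal-theory formula for $V(\Theta)_{ij,kl}$ given above. The function name `wald_tetrad` is hypothetical.

```python
import numpy as np

C = [(0, 2), (0, 3), (1, 2), (1, 3)]  # pairs 13, 14, 23, 24 (0-based)

def wald_tetrad(Y):
    """Hypothetical plug-in Wald statistic for theta13*theta24 - theta14*theta23,
    using the normal-theory covariance evaluated at the sample covariance."""
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    T = Yc.T @ Yc / n                              # empirical covariance (1/n scaling)
    t13, t14, t23, t24 = (T[i, j] for i, j in C)
    gamma = t13 * t24 - t14 * t23                  # sample tetrad
    grad = np.array([t24, -t23, -t14, t13])        # gradient of the tetrad
    S = np.array([[T[i, k] * T[j, l] + T[i, l] * T[j, k] for (k, l) in C]
                  for (i, j) in C])                # plug-in Sigma(Theta_hat)
    return n * gamma**2 / (grad @ S @ grad)

rng = np.random.default_rng(1)
theta = np.zeros((4, 4))                           # singular case: block-diagonal Theta
theta[:2, :2] = [[2.0, 0.5], [0.5, 1.0]]
theta[2:, 2:] = [[1.5, -0.3], [-0.3, 0.8]]
L = np.linalg.cholesky(theta)
n, reps = 500, 2000
stats = np.array([wald_tetrad(rng.standard_normal((n, 4)) @ L.T) for _ in range(reps)])

# 3.841 is approximately the 0.95 quantile of the chi-square(1) distribution
print("mean of T:", stats.mean())
print("rejection rate at nominal 5%:", np.mean(stats > 3.841))
```

Under the limit $\tfrac14 R^2U^2$ with $R^2 \sim \chi^2_4$ and $U \sim \mathrm{Uniform}[0,1]$, the mean is $\tfrac14 \cdot 4 \cdot \tfrac13 = \tfrac13$ rather than the value $1$ of a $\chi^2_1$ variable, so the simulated mean and the rejection rate should both come out well below their nominal counterparts.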




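Theorem 5.2. Let $\Sigma = \Sigma^{(1)} \otimes \Sigma^{(2)}$ be the Kronecker product of two positive definite $2 \times 2$ matrices $\Sigma^{(1)}, \Sigma^{(2)}$. Let $f(x) = x_1x_4 - x_2x_3$. Then

$$W_{f,\Sigma} \sim \tfrac{1}{4} R^2 U^2,$$

where $R^2 \sim \chi^2_4$ and $U \sim \mathrm{Uniform}[0, 1]$ are independent.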

Proof. Since $f$ is a quadratic form, we may consider the canonical form from Lemma 4.2, which depends on the (real) eigenvalues of $A\Sigma$. The claim follows from Lemma 4.3 and the comments in the paragraph following its proof, provided the four eigenvalues of $A\Sigma$ all have the same absolute value, two of them are positive, and two are negative.

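Let $\Sigma^{(i)} = \big(\sigma^{(i)}_{jk}\big)$. Then, by (5.2),

$$A\Sigma = \frac{1}{2}\left[\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}\Sigma^{(1)}\right] \otimes \left[\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}\Sigma^{(2)}\right].$$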

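For $i = 1, 2$, since $\Sigma^{(i)}$ is positive definite, the matrix

$$\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}\Sigma^{(i)} = \begin{pmatrix} \sigma^{(i)}_{12} & \sigma^{(i)}_{22}\\ -\sigma^{(i)}_{11} & -\sigma^{(i)}_{12} \end{pmatrix}$$

has the imaginary eigenvalues

$$\pm\lambda^{(i)} = \pm\sqrt{\big(\sigma^{(i)}_{12}\big)^2 - \sigma^{(i)}_{11}\sigma^{(i)}_{22}}.$$

It follows that $A\Sigma$ has the real eigenvalues

$$\tfrac{1}{2}\lambda^{(1)}\lambda^{(2)} \quad\text{and}\quad -\tfrac{1}{2}\lambda^{(1)}\lambda^{(2)},$$

each with multiplicity two. Hence, Lemma 4.3 applies with $k_1 = k_2 = 2$. □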
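The eigenvalue structure used in the proof can be checked numerically. A minimal sketch, assuming two illustrative positive definite factors: since $A\Sigma$ is similar to the symmetric matrix $L^T A L$ with $\Sigma = LL^T$, we compute its eigenvalues with `eigvalsh` and compare them with $\pm\tfrac12|\lambda^{(1)}\lambda^{(2)}|$, where $|\lambda^{(i)}| = \sqrt{\det \Sigma^{(i)}}$.

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])
A = 0.5 * np.kron(J, J)

# Illustrative positive definite factors (any such pair should work)
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.5, -0.3], [-0.3, 0.8]])
Sigma = np.kron(S1, S2)

def lam_modulus(S):
    """|lambda^(i)| = sqrt(s11*s22 - s12^2), the modulus of the imaginary eigenvalues of J @ S."""
    return np.sqrt(S[0, 0] * S[1, 1] - S[0, 1] ** 2)

# A @ Sigma = A @ L @ L.T is similar to the symmetric matrix L.T @ A @ L
L = np.linalg.cholesky(Sigma)
eig = np.linalg.eigvalsh(L.T @ A @ L)      # sorted ascending
c = 0.5 * lam_modulus(S1) * lam_modulus(S2)
print(eig)                                 # two values near -c, two near +c
print(np.allclose(eig, [-c, -c, c, c]))    # True
```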
The distribution function of $\tfrac14 R^2U^2$ is

$$F_{\mathrm{sing}}(t) = 1 - e^{-2t} + \sqrt{2\pi t}\,\big(1 - \Phi(2\sqrt{t})\big), \qquad t > 0,$$

where $\Phi$ is the distribution function of $\mathcal{N}(0, 1)$. The density $f_{\mathrm{sing}}(t)$ of $\tfrac14 R^2U^2$ is strictly decreasing on $(0, \infty)$, and $f_{\mathrm{sing}}(t) \to \infty$ as $t \to 0$. In light of Theorem 4.4, it is interesting to note that the distribution of $\tfrac14 R^2U^2$ is not the distribution of a linear combination of four independent $\chi^2_1$ random variables, because the $\chi^2_d$ distribution has a finite density at zero when $d \ge 2$. However, the distribution satisfies

$$\tfrac14\chi^2_1 \le_{\mathrm{st}} \tfrac14 R^2U^2 \le_{\mathrm{st}} \tfrac14\chi^2_2.$$

The first inequality holds according to Theorem 4.9. The second inequality holds because $R^2U \sim \chi^2_2$ and $U^2 \le U$. According to the next result, the distribution is also no larger than a $\chi^2_1$ distribution, which means that the Wald test of a tetrad constraint is asymptotically conservative at the tetrad's singularities (which are given by block-diagonal covariance matrices).
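Theorem 5.2 and the formula for $F_{\mathrm{sing}}$ can be cross-checked together by Monte Carlo. A minimal sketch, assuming an illustrative Kronecker covariance and $2 \times 10^5$ draws: it samples $X \sim \mathcal{N}(0, \Sigma^{(1)} \otimes \Sigma^{(2)})$, evaluates $W_{f,\Sigma} = f(X)^2 / (\nabla f(X)^T \Sigma \nabla f(X))$, and compares its empirical distribution function with the closed form. The helper name `F_sing` mirrors the notation above; $\Phi$ is written through `math.erf`.

```python
import math
import numpy as np

def F_sing(t):
    """Closed-form CDF of R^2 U^2 / 4 with R^2 ~ chi2(4), U ~ Uniform[0, 1] (t > 0)."""
    # Phi(2 sqrt(t)) via Phi(z) = (1 + erf(z / sqrt(2))) / 2, and 2 sqrt(t) / sqrt(2) = sqrt(2 t)
    Phi = 0.5 * (1.0 + math.erf(math.sqrt(2.0 * t)))
    return 1.0 - math.exp(-2.0 * t) + math.sqrt(2.0 * math.pi * t) * (1.0 - Phi)

rng = np.random.default_rng(2)
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.5, -0.3], [-0.3, 0.8]])
Sigma = np.kron(S1, S2)
X = rng.multivariate_normal(np.zeros(4), Sigma, size=200_000)

f = X[:, 0] * X[:, 3] - X[:, 1] * X[:, 2]                  # f(x) = x1 x4 - x2 x3
grad = np.column_stack([X[:, 3], -X[:, 2], -X[:, 1], X[:, 0]])
W = f**2 / np.einsum("ni,ij,nj->n", grad, Sigma, grad)     # W_{f, Sigma}

for t in (0.05, 0.2, 0.5, 1.0, 2.0):
    print(t, np.mean(W <= t), F_sing(t))
```

Changing `S1` and `S2` to other positive definite matrices should leave the empirical column essentially unchanged, which is the invariance asserted by the theorem.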

Proposition 5.3. Suppose $R^2 \sim \chi^2_4$ and $U \sim \mathrm{Uniform}[0, 1]$ are independent. Then

$$\tfrac14 R^2U^2 \le_{\mathrm{st}} \chi^2_1.$$

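Proof. Let $Z_1, \dots, Z_4$ be independent standard normal random variables. Then the sum of squares

$$Z_1^2 + Z_2^2 + Z_3^2 + Z_4^2 =: R^2 \sim \chi^2_4$$

and the ratio

$$B := \frac{Z_1^2}{Z_1^2 + Z_2^2 + Z_3^2 + Z_4^2} \sim \mathrm{Beta}\big(\tfrac12, \tfrac32\big)$$

are independent. Since $Z_1^2 = R^2 B \sim \chi^2_1$, the claim holds if and only if

$$\tfrac12 U \le_{\mathrm{st}} \sqrt{B},$$

where $U \sim \mathrm{Uniform}[0, 1]$ and $B \sim \mathrm{Beta}\big(\tfrac12, \tfrac32\big)$. The distribution of $U/2$ is supported on the interval $[0, 1/2]$, on which it has distribution function $F_{U/2}(t) = 2t$. For $t \in (0, 1)$, the distribution function of $\sqrt{B}$ has first and second derivative

$$F'_{\sqrt{B}}(t) = \frac{4}{\pi}\sqrt{1 - t^2} \quad\text{and}\quad F''_{\sqrt{B}}(t) = -\frac{4}{\pi}\,\frac{t}{\sqrt{1 - t^2}}.$$

Hence, $F_{\sqrt{B}}$ is strictly concave on $(0, 1)$ and has a tangent with slope $4/\pi < 2$ at $t = 0$. Since a concave function lies below its tangent, $F_{\sqrt{B}}(t) \le 4t/\pi < 2t = F_{U/2}(t)$ for $t \in (0, 1/2]$, while $F_{U/2}(t) = 1$ for $t \ge 1/2$. Consequently,

$$F_{U/2}(t) \ge F_{\sqrt{B}}(t), \qquad t \in \mathbb{R},$$

giving the claimed ordering of $\tfrac14 R^2U^2$ and the $\chi^2_1$ distribution. □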
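The key inequality $F_{U/2} \ge F_{\sqrt{B}}$ can be checked on a fine grid. The sketch assumes a closed form for $F_{\sqrt{B}}$ that we obtained by integrating $F'_{\sqrt{B}}(s) = \tfrac{4}{\pi}\sqrt{1 - s^2}$ from $0$ to $t$, namely $F_{\sqrt{B}}(t) = \tfrac{2}{\pi}\big(t\sqrt{1 - t^2} + \arcsin t\big)$; it is not stated in the text, so it is cross-checked against Monte Carlo draws of $B$.

```python
import numpy as np

def F_sqrtB(t):
    """CDF of sqrt(B), B ~ Beta(1/2, 3/2), from integrating (4/pi) * sqrt(1 - s^2)."""
    t = np.clip(t, 0.0, 1.0)
    return (2.0 / np.pi) * (t * np.sqrt(1.0 - t**2) + np.arcsin(t))

def F_halfU(t):
    """CDF of U/2 with U ~ Uniform[0, 1]."""
    return np.clip(2.0 * t, 0.0, 1.0)

t = np.linspace(0.0, 1.0, 10_001)
gap = F_halfU(t) - F_sqrtB(t)
print(gap.min())  # nonnegative up to rounding: U/2 is stochastically smaller than sqrt(B)

# Monte Carlo cross-check of the closed form at one point
rng = np.random.default_rng(3)
B = rng.beta(0.5, 1.5, size=200_000)
print(np.mean(np.sqrt(B) <= 0.3), F_sqrtB(0.3))
```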
6. Conjectures. In Section 3, we mentioned that Theorem 3.1 and
Corollary 3.3 are equivalent. Similarly, Conjecture 1.1 is equivalent to the 
following one. 

Conjecture 6.1. Let $X = (X_1, X_2, \dots, X_k)^T$ and $Y = (Y_1, Y_2, \dots, Y_k)^T$ be independent and have the same distribution $\mathcal{N}_k(0, \Sigma)$, where $\Sigma$ has positive diagonal entries. If $p_1, p_2, \dots, p_k$ are nonnegative numbers such that $p_1 + p_2 + \dots + p_k = 1$, then

$$\frac{p_1Y_1}{X_1} + \frac{p_2Y_2}{X_2} + \dots + \frac{p_kY_k}{X_k}$$

has the standard Cauchy distribution.

For a proof of this conjecture, it is natural to try an induction-type argument, which might involve the ratio of normal random variables with nonzero means (Marsaglia, 1965). However, we were unable to make this work.
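Conjecture 6.1 is easy to probe by simulation. A minimal sketch, assuming a randomly generated positive definite $\Sigma$ (of the form $MM^T + 0.1I$), random weights drawn from a flat Dirichlet distribution, $k = 4$, and $4 \times 10^5$ draws; it compares the empirical distribution function of $\sum_i p_iY_i/X_i$ with the standard Cauchy distribution function $\tfrac12 + \arctan(t)/\pi$.

```python
import numpy as np

rng = np.random.default_rng(4)
k, n = 4, 400_000
M = rng.standard_normal((k, k))
Sigma = M @ M.T + 0.1 * np.eye(k)      # a random positive definite covariance (assumption)
p = rng.dirichlet(np.ones(k))          # nonnegative weights summing to one

X = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
Y = rng.multivariate_normal(np.zeros(k), Sigma, size=n)   # independent copy of X
S = (p * Y / X).sum(axis=1)            # p1 Y1/X1 + ... + pk Yk/Xk, one value per row

def cauchy_cdf(t):
    """Standard Cauchy distribution function."""
    return 0.5 + np.arctan(t) / np.pi

for t in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(t, np.mean(S <= t), cauchy_cdf(t))
```

Distribution functions rather than sample means are compared because the Cauchy distribution has no mean.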

By taking the reciprocal of $W_{f,\Sigma}$, we can translate Conjecture 1.1 into another equivalent form.

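Conjecture 6.2. Let $X = (X_1, X_2, \dots, X_k)^T \sim \mathcal{N}_k(0, \Sigma)$ be such that none of its entries is a point mass. If $p_1, p_2, \dots, p_k$ are nonnegative numbers such that $p_1 + p_2 + \dots + p_k = 1$, then

$$\left(\frac{p_1}{X_1}, \frac{p_2}{X_2}, \dots, \frac{p_k}{X_k}\right) \Sigma \left(\frac{p_1}{X_1}, \frac{p_2}{X_2}, \dots, \frac{p_k}{X_k}\right)^T \sim \frac{1}{\chi^2_1}. \tag{6.1}$$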

Simulation provides strong evidence for the validity of these conjectures. We have tried many randomly generated scenarios with $2 \le k \le 5$, simulating large numbers of values for the rational functions in question. In all cases, the empirical distribution functions were indistinguishable from the conjectured $\chi^2_1$ or Cauchy distribution functions.
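A simulation of the kind described here can be sketched for Conjecture 6.2 as follows. This is our own illustration, not the authors' experiment: it assumes $k = 3$, a random positive definite $\Sigma$, flat Dirichlet weights, and $4 \times 10^5$ draws. It computes the reciprocal of the quadratic form on the left-hand side of (6.1) and compares it with the $\chi^2_1$ distribution function $\operatorname{erf}(\sqrt{t/2})$.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
k, n = 3, 400_000
M = rng.standard_normal((k, k))
Sigma = M @ M.T + 0.1 * np.eye(k)      # a random positive definite covariance (assumption)
p = rng.dirichlet(np.ones(k))          # nonnegative weights summing to one

X = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
Z = p / X                                             # rows (p1/X1, ..., pk/Xk)
recip = 1.0 / np.einsum("ni,ij,nj->n", Z, Sigma, Z)   # reciprocal of the quadratic form

def chi2_1_cdf(t):
    """Distribution function of chi-square with one degree of freedom."""
    return math.erf(math.sqrt(t / 2.0))

for t in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(t, np.mean(recip <= t), chi2_1_cdf(t))
```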

On the other hand, the nonnegativity requirement for $p_1, p_2, \dots, p_k$ is crucial for the validity of the conjectures. For instance, let $Q$ be the reciprocal of the quantity on the left-hand side of (6.1), and consider the special case where $k = 2$, $\mathrm{var}(X_1) = \mathrm{var}(X_2) = 1$, $\mathrm{cor}(X_1, X_2) = \rho$, and $p_1 = -p_2 = 1/2$. Assuming that $|\rho| < 1$, change coordinates to

$$Z_1 = \frac{X_1 + X_2}{\sqrt{2(1 + \rho)}}, \qquad Z_2 = \frac{X_1 - X_2}{\sqrt{2(1 - \rho)}},$$

and then to polar coordinates $Z_1 = R\cos\Phi$ and $Z_2 = R\sin\Phi$. We obtain that

$$Q = \frac{4X_1^2X_2^2}{X_1^2 - 2\rho X_1X_2 + X_2^2} = \frac{R^2(\rho + \cos 2\Phi)^2}{1 - \rho^2}.$$
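Here $R^2 \sim \chi^2_2$ and $\Phi \sim \mathrm{Uniform}[0, 2\pi)$ are independent because $(Z_1, Z_2)$ is standard bivariate normal. The distribution of $Q$ now depends on $\rho$. For instance, since $E[R^2] = 2$ and $E[(\rho + \cos 2\Phi)^2] = \rho^2 + \tfrac12$,

$$E[Q] = \frac{1 + 2\rho^2}{1 - \rho^2},$$

which differs from $E[\chi^2_1] = 1$ whenever $\rho \neq 0$.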
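The failure of the conjecture for signed weights can be seen numerically. A minimal sketch, assuming $\rho = 0.6$ and $10^6$ draws: it estimates $E[Q]$ for $p = (1/2, -1/2)$ and compares it with $(1 + 2\rho^2)/(1 - \rho^2)$, which equals $2.6875$ for this $\rho$.

```python
import numpy as np

rng = np.random.default_rng(6)
rho, n = 0.6, 1_000_000
Sigma = np.array([[1.0, rho], [rho, 1.0]])
p = np.array([0.5, -0.5])                          # violates the nonnegativity requirement

X = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
Z = p / X                                          # rows (p1/X1, p2/X2)
Q = 1.0 / np.einsum("ni,ij,nj->n", Z, Sigma, Z)    # reciprocal of the left-hand side of (6.1)

print(Q.mean(), (1 + 2 * rho**2) / (1 - rho**2))   # both far from E[chi2_1] = 1
```

Setting `rho = 0.0` makes the sign of $p_2$ irrelevant, because the cross term vanishes, and the estimate should then return to about $1$.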



7. Conclusion. In regular settings, the Wald statistic for testing a constraint on the parameters of a statistical model converges to a $\chi^2_1$ distribution as the sample size increases. When the true parameter is a singularity of the constraint, the limiting distribution is instead determined by a rational function of jointly normal random variables (recall Section 2). The distributions of these rational functions are related to chi-square distributions in surprising ways, as we showed in our main results in Sections 3-5.

Our work led to several conjectures about the limiting distributions of Wald statistics that we find intriguing. Although the conjectures can be stated in elementary terms, we are not aware of any other work that suggests these properties for the multivariate normal distribution.

For quadratic forms, the usual canonical form leads to a particular class of distributions parametrized by a collection of eigenvalues (recall Lemma 4.2). It would be interesting to study Schur convexity properties of this class of distributions, which would provide further insights into asymptotic conservativeness of Wald tests of singular hypotheses.

Finally, this paper has focused on testing a single constraint. It would be interesting to develop a general theory for Wald tests of hypotheses that are defined in terms of several constraints. In this setting, the choice of the constraints representing a null hypothesis will play an important role in the distribution theory, as exemplified by Gaffke, Steyer and von Davier (1999) and Gaffke, Heiligers and Offinger (2002).